Computer vision gesture based control of a device

ABSTRACT

A system and method are provided for controlling a device based on computer vision. Embodiments of the system and method of the invention are based on receiving a sequence of images of a field of view; detecting movement of at least one object in the images; applying a shape recognition algorithm on the at least one moving object; confirming that the object is a user hand by combining information from at least two images of the object; and tracking the object to control the device.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 13/499,929, filed on Apr. 3, 2012, which is a National Phase Application of PCT International Application No. PCT/IL2010/000837, International Filing Date Oct. 13, 2010, entitled “Computer Vision Gesture Based Control Of A Device”, published on Apr. 21, 2011 as International Publication Number WO 2011/045789, claiming the benefit of U.S. Provisional Patent Application No. 61/250,953, filed Oct. 13, 2009, U.S. Provisional Patent Application No. 61/315,025, filed Mar. 18, 2010 and Great Britain Patent Application No. 1011457.7, filed Jul. 7, 2010, all of which are incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to the field of gesture based control of electronic devices. Specifically, the invention relates to computer vision based hand gesture recognition.

BACKGROUND OF THE INVENTION

The need for more convenient, intuitive and portable input devices increases, as computers and other electronic devices become more prevalent in our everyday life. A pointing device is one type of input device that is commonly used for interaction with computers and other electronic devices that are associated with electronic displays. Known pointing devices and machine controlling mechanisms include an electronic mouse, a trackball, a pointing stick and touchpad, a touch screen and others. Known pointing devices are used to control a location and/or movement of a cursor displayed on the associated electronic display. Pointing devices may also convey commands, e.g. location specific commands, by activating switches on the pointing device.

In some instances there is a need to control electronic devices from a distance, in which case the user cannot touch the device. Some examples of these instances are watching TV, watching video on a PC, etc. One solution used in these cases is a remote control device. Recently, human gesturing, such as hand gesturing, has been suggested as a user interface input tool, which can be used even at a distance from the controlled device. Typically, a hand gesture is detected by a camera and is translated into a specific command.

In the field of computer vision based control, separating the hand from the background, for example, from other moving objects in the background, is a yet unanswered challenge.

SUMMARY OF THE INVENTION

According to a first aspect of the invention there is provided a method for computer vision based hand gesture device control, the method comprising

receiving a sequence of images of a field of view, said images comprising at least one object;
detecting movement of the at least one object in the images, to define at least one moving object;
applying a shape recognition algorithm on the at least one moving object;
confirming that the object is a user hand by combining information from at least two images of the object; and
tracking the confirmed object to control the device.

Combining information from at least two images of the object enables the identification of a user hand even if a hand shape is undetectable or is not easily confirmed (for example in the case of poor environment conditions such as poor lighting, background color, background movement, etc.).

Preferably, the method further comprises tracking the confirmed object to identify at least one control gesture.

Further preferably the method comprises confirming that the object has a shape of a hand by combining information from at least two images of the object.

Preferably the combined information comprises a shape affinity grade associated with said object in each image.

Preferably the combined information is a function of at least two shape affinity grades from at least two images.

Advantageously, combining information from at least two images of the object comprises calculating an average of the shape affinity grades from at least two images. Averaging shape affinity grades helps to obtain smoother, cleaner shape detection.

Preferably the method further comprises comparing the average to a predetermined threshold and confirming the object is a user hand if the average is above the threshold.

Further preferably, confirming that the object is a user hand comprises determining that the movement of the object is a movement in a predefined pattern.

Yet further preferably the method comprises determining a shape recognition threshold; and if the movement of the object is a movement in a predefined pattern then lowering the shape recognition threshold, such that the longer the movement is in a predefined pattern, the lower the shape recognition threshold is.

Preferably, the movement in a predefined pattern is a hand gesture.

Preferably, the gesture is a wave-like movement of the hand.

Further preferably the movement in a predefined pattern is an initializing gesture and the initializing gesture and the at least one control gesture are not the same.

Preferably, the at least one control gesture is to switch a system to cursor control mode.

Preferably, combining information from at least two images of the object provides a combined image and wherein the shape recognition algorithm is applied on the combined image.

Preferably, combining information from at least two images comprises subtracting one image from another to obtain a subtraction image.

Preferably the method further comprises

applying a contour detection algorithm on the subtraction image;
applying the shape recognition algorithm on the detected contour; and
comparing the contour to a model of a hand contour.

Preferably the method further comprises confirming the object is a hand only if it is a hand with extended fingers.

Preferably, the method further comprises

-   receiving images of a field of view;
-   tracking the movement of more than one object in the images; and
-   comparing the movement of at least one of the objects to a predefined movement pattern; and
-   if at least one object is moving in a predefined movement pattern, applying shape recognition algorithms to confirm a shape of a hand.

Preferably, tracking comprises selecting points of interest within one image in the sequence of images, detecting the location of the point in another image and identifying points with similar movement and location parameters.

Further preferably, the method comprises applying an edge detection algorithm to confirm the object is a user hand.

Yet further preferably, the method comprises deriving 3D information from the received sequence of images and combining the 3D information with the information combined from at least two images.

According to a second aspect of the invention there is provided a system for user-device interaction, the system comprising:

an electronic device, an image sensor and a processor, the processor to:
receive a sequence of images of a field of view;
detect movement of at least one object in the images;
apply a shape recognition algorithm on the at least one moving object;
combine information from at least two images of the object to confirm that the object is a user hand; and
track the object to control the device.

Preferably, the processor is to track the object to identify at least one control gesture and to generate a command to the device based on the at least one control gesture.

Additionally preferably, the image sensor is a 3D image sensor.

Also preferably, the system comprises one or more 2D image sensors. The use of 2D image sensors may be advantageous since 2D image sensors are typically already available in many platforms such as mobile PCs, mobile phones, etc. No special, dedicated hardware (e.g., 3D camera) is necessary.

BRIEF DESCRIPTION OF THE FIGURES

The invention will now be described in relation to certain examples and preferred embodiments with reference to the following illustrative figures so that it may be more fully understood. In the drawings:

FIG. 1A schematically illustrates a system operable according to embodiments of the invention;

FIG. 1B schematically illustrates a generalized method for device control, according to an embodiment of the invention;

FIG. 1C schematically illustrates a method for device control, according to one embodiment of the invention;

FIG. 1D schematically illustrates a method for device control, according to another embodiment of the invention;

FIG. 2A is a schematic illustration of a method for hand gesture recognition based on statistical parameters of images according to embodiments of the invention;

FIG. 2B is a schematic flow-chart depicting a method for user-device interaction including an image enhancement step, according to an embodiment of the invention;

FIG. 3 is a schematic illustration of a method for hand gesture recognition in which the threshold for hand shape identification is adjustable, according to embodiments of the invention;

FIG. 4 is a schematic illustration of a method for identifying a hand shape in an environment of a plurality of moving objects, according to an embodiment of the invention;

FIG. 5 is a schematic illustration of a method for computer vision based hand gesture identification using symmetrical optical flow formulations, according to embodiments of the invention;

FIG. 6 is a schematic illustration of a method for computer vision based hand gesture identification including the identification of both left and right hand, according to embodiments of the invention;

FIG. 7 is a schematic illustration of a system operable according to one embodiment of the invention;

FIG. 8 is a schematic illustration of a system operable according to another embodiment of the invention;

FIGS. 9A-C schematically illustrate user aid tools according to embodiments of the invention; and

FIG. 10 is a schematic illustration of a remote control unit operable according to embodiments of the invention.

DETAILED DESCRIPTION OF THE INVENTION

According to an embodiment of the invention a system for user-device interaction is provided which includes a device and an image sensor which is in communication with a processor. The image sensor obtains image data and sends it to the processor to perform image analysis and to generate user commands to the device based on the image analysis, thereby controlling the device based on computer vision.

According to embodiments of the invention the user commands are based on identification and tracking of a user's hand. According to some embodiments a user hand is identified based on shape recognition of a hand shape. However, according to some embodiments, for example, in conditions in which shape recognition is difficult (e.g., in poor lighting conditions), a user hand may be identified based mainly on movement detection. In all cases, the detection of a user's hand is confirmed based on information obtained from combining two (or more) images.

Reference is now made to FIG. 1A which schematically illustrates system 100 according to an embodiment of the invention. System 100 includes an image sensor 103 for obtaining a sequence of images of a field of view (FOV) 104. Image sensor 103 is typically associated with processor 102, and storage device 107 for storing image data. Storage device 107 may be integrated within image sensor 103 or may be external to image sensor 103. According to some embodiments image data may be stored in processor 102, for example in a cache memory.

Image data of field of view (FOV) 104 is sent to processor 102 for analysis. A user command is generated by processor 102, based on the image analysis, and is sent to device 101. According to some embodiments the image processing is performed by a first processor which then sends a signal to a second processor in which a user command is generated based on the signal from the first processor.

Device 101 may be any electronic device that can accept user commands, e.g., TV, DVD player, PC, mobile phone, camera, STB (Set Top Box), streamer, etc. According to one embodiment, device 101 is an electronic device available with an integrated standard 2D camera. According to other embodiments a camera is an external accessory to the device. According to some embodiments more than one 2D camera is provided to enable obtaining 3D information. According to some embodiments the system includes a 3D camera.

Processor 102 may be integral to image sensor 103 or may be a separate unit. Alternatively, the processor 102 may be integrated within the device 101. According to other embodiments a first processor may be integrated within image sensor 103 and a second processor may be integrated within device 101.

The communication between image sensor 103 and processor 102 and/or between processor 102 and device 101 may be through a wired or wireless link, such as through IR communication, radio transmission, Bluetooth technology and other suitable communication routes.

According to one embodiment image sensor 103 is a forward facing camera. Image sensor 103 may be a standard 2D camera such as a webcam or other standard video capture device, typically installed on PCs or other electronic devices. According to some embodiments, image sensor 103 can be IR sensitive.

Image sensor 103 may obtain frames at varying frame rates. According to embodiments of the invention image sensor 103 obtains image data of an object, e.g., a user's hand 105 when the object enters the field of view 104. Typically, the field of view 104 includes hand 105 and a background, such as a wall of a room in which the device and user are located, the user's shirt (for example, if the user is holding his hand up against his body), a computer keypad (for example, if the image sensor field of view is downward facing viewing a PC keypad), etc. The background may include moving parts (such as people other than the user walking through FOV 104, a window curtain being moved by a breeze, etc.).

Processor 102 can apply image analysis algorithms, such as motion detection and shape recognition algorithms to identify and further track the object, typically, the user's hand. Sometimes, a movement, preferably a specific predefined gesture of a user's hand or hands, must be identified within a varying and sometimes moving environment.

Optionally, system 100 may include an electronic display 106. According to embodiments of the invention, mouse emulation and/or control of a cursor on a display, are based on computer visual identification and tracking of a user's hand, for example, as detailed below. Additionally, display 106 may be used to indicate to the user the position of the user's hand within the field of view.

According to some embodiments of the present invention, processor 102 is operative to enhance user hand gesture recognition.

A gesture, according to one embodiment, is a hand movement not directly associated with the resulting user command. For example, a repetitive movement of the hand (e.g., wave like movement) may be a gesture.

System 100 may be operable according to methods, some embodiments of which are described below.

Reference is now made to FIG. 1B which is a flow diagram schematically illustrating a generalized method for device control, according to an embodiment of the invention.

Image data of the field of view is received (110) and analyzed, for example, by processor 102 or by two or more separate processors, to detect movement (112). If movement of an object is detected system 100 applies shape recognition algorithms (113). Shape recognition algorithms may be applied together with or even before movement detection. For example, with or without confirmation of a hand shape a pre-determined movement pattern (e.g., a gesture) may be searched and if a pre-determined movement pattern is detected for an object, the system may lower the threshold for determining that the object is a user hand. The step of applying a shape recognition algorithm on the object in an image may include or may be followed by a step of assigning a grade to the object in each of two or more images. The grade may indicate the affinity the object has to a model of a hand (typically a model having a shape of a hand). A shape affinity grade may indicate probability of similarity to a model of a hand. Information is combined from two (or more) images (114). For example, a combination of movement information and shape information or a combination of shape information (for example the assigned affinity grades) from several images may assist in confirming that a detected object is a user hand. The object is confirmed to be a user hand based on the combined information (115). After the identification of a user hand by the system, the object (according to one embodiment, only an object which displayed movement in a pre-determined pattern) is tracked to further control the device 101, e.g., by detecting control gestures (116).

Reference is now made to FIG. 1C which is a flow diagram schematically illustrating a method for device control, according to one embodiment of the invention.

Image data of field of view 104 is received (130) and analyzed, for example, by processor 102 or by two or more separate processors, to detect movement (132). If movement of an object is detected system 100 assigns a shape affinity grade to the object in each image (133). A function (e.g., averaging, subtracting etc.) is applied to combine at least two shape affinity grades (134). The object is confirmed to be a user hand based on the combined information (135). After the identification of a user hand by the system, the object (according to one embodiment, only an object which displayed movement in a pre-determined pattern) is tracked to further detect control gestures and to control device 101 based on the control gestures (136).
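By way of illustration only, the combining and confirming steps (134, 135) may be sketched as follows for the averaging case. The function name, the 0-to-1 grade scale and the threshold value of 0.6 are assumptions made for this sketch and are not taken from the embodiments.

```python
from statistics import mean


def confirm_hand(grades, threshold=0.6):
    """Confirm a hand from per-image shape affinity grades.

    grades: one grade per image, assumed here to lie in [0, 1].
    threshold: hypothetical confirmation threshold.
    """
    if len(grades) < 2:
        # Information from at least two images is required.
        return False
    # Averaging smooths out a single poor (or spuriously good) frame.
    return mean(grades) > threshold
```

In this sketch a single weak frame among several strong ones does not prevent confirmation, which is the smoothing effect attributed to averaging.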

Reference is now made to FIG. 1D which is a flow diagram schematically illustrating a method for device control, according to another embodiment of the invention.

Image data of the field of view is received (140) and analyzed, for example, by processor 102 or by two or more separate processors, to detect movement (142). If movement of an object is detected the system combines information of at least two images to create a combined image (143). A shape recognition algorithm is applied to the combined image (144). For example, two images can be subtracted and detection of a contour can be applied on the subtraction image. The detected contour may then be compared to a model of a hand contour shape in order to confirm the object is a hand. In another example more than one shape recognition algorithm is applied, e.g., both shape detection and contour detection algorithms are applied substantially simultaneously on the subtraction image.

The object is confirmed to be a user hand based on the combined information (145). After the identification of a user hand by the system, the object (according to one embodiment, only an object which displayed movement in a pre-determined pattern) is tracked to further detect control gestures and to control the device 101 based on the control gestures (146).
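A minimal sketch of creating a subtraction image (143) is given below, assuming grayscale frames represented as equally sized lists of rows of integer pixel values. The helper names and the change threshold of 30 are hypothetical. Contour detection and comparison to a hand contour model are not shown; the bounding box merely indicates the region on which such algorithms could subsequently be applied.

```python
def subtraction_image(frame_a, frame_b):
    """Absolute per-pixel difference of two equally sized grayscale frames."""
    return [
        [abs(a - b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def motion_mask(diff, min_change=30):
    """Binary mask marking pixels whose change is at least min_change."""
    return [[1 if value >= min_change else 0 for value in row] for row in diff]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the marked pixels, or None."""
    coords = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    ]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))
```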

The image data received is typically a sequence of image frames, e.g. an image stream, and movement may be detected by comparing sequential frames in the image stream.

A system according to some embodiments of the invention operates in a first stage and a second stage; the first stage to detect, based on the image analysis, an initializing or initialization gesture of the user hand and when the initializing gesture is identified, switching to the second stage, the second stage to detect a hand movement, based on the image analysis, and to control the device based on the hand movement. According to one embodiment the initializing gesture and the hand movement are not the same.

According to one embodiment, the system uses a first initializing gesture in the first stage to switch to the second stage having a first set of controls and a second initializing gesture in the first stage to switch to the second stage having a second (and typically different) set of controls to achieve a new functionality of the device (e.g., a first set of controls to control a media player mode and a second set of controls to operate an application such as Power-Point™).
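The two-stage operation with more than one initializing gesture may be sketched as a small state machine. The gesture names, command names and the class below are hypothetical and serve only to illustrate the switching logic; gesture recognition itself is assumed to happen elsewhere.

```python
# Hypothetical mapping: initializing gesture -> set of controls for stage two.
CONTROL_SETS = {
    "wave": {"swipe_left": "previous_track", "swipe_right": "next_track"},
    "circle": {"swipe_left": "previous_slide", "swipe_right": "next_slide"},
}


class TwoStageController:
    def __init__(self):
        # None means the system is still in the first (initializing) stage.
        self.controls = None

    def on_gesture(self, gesture):
        """Return a command name for the device, or None if no command results."""
        if self.controls is None:
            # First stage: only an initializing gesture has any effect.
            self.controls = CONTROL_SETS.get(gesture)
            return None
        # Second stage: look up the gesture in the active set of controls.
        return self.controls.get(gesture)
```

In this sketch the initializing gestures and the control gestures are kept in separate vocabularies, consistent with the embodiment in which they are not the same.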

The initializing gesture may be a repetitive movement of the hand, such as a wave-like movement or a clockwise or counter clockwise circular movement of the user's hand. According to one embodiment the initializing gesture is a gesture in which the hand is positioned on a surface parallel to the sensor and the gesture comprises movement of a user's hand with fingers extended.

The hand movement to control the device (second stage) may be an operating gesture to generate a command to the device or a hand movement to control cursor movement. The operating gesture may include a linear movement of the hand in a pre-defined direction or a movement of the user's hand towards and away from the image sensor or a clockwise or counter clockwise circular movement of the user's hand. Other gestures and movements may be used.

Reference is now made to FIG. 2A which is a schematic flow diagram illustration of a method for hand gesture recognition based on statistical parameters of images of a FOV, such as FOV 104. According to one embodiment depicted in FIG. 2A a method for user hand identification may be operable even in poor environment conditions when hand shape recognition is not easily possible (e.g., in poor illumination conditions). According to one embodiment the method includes receiving at processor 102 image data of a field of view (FOV) (210). According to some embodiments image data may be received directly from storage device 107 or from a cache memory. The image data is then analyzed, for example by processor 102 or by another processor. Motion detection algorithms may be applied to detect movement. Two consecutive frames may be compared or batches of images (e.g., 5 frames in a batch) may be compared.

If movement of an object within the FOV is detected (212), then statistical parameters of the image data are checked (213). Statistical parameters may include illumination parameters. If the statistical parameters are above a predefined threshold (214) (i.e., imaging conditions are good), shape recognition algorithms are applied (215), for example by processor 102, to identify the object as a hand (217). If the statistical parameters are below the predetermined threshold, applying shape recognition algorithms may be less practical. Thus, in poor imaging conditions the step of hand shape recognition may include lowering a shape recognition threshold and a user hand is identified based primarily on movement. Specifically, if a predefined movement pattern is identified (216), hand identification can be determined (217) even with very low hand shape recognition.
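The decision flow of FIG. 2A may be sketched as follows, taking mean pixel brightness as the statistical parameter. The function names and all numeric values are assumptions chosen for illustration; the shape grade and the movement pattern flag are assumed to be produced by separate shape recognition and movement analysis steps.

```python
def mean_brightness(frame):
    """Mean pixel value of a grayscale frame given as a list of rows."""
    pixels = [value for row in frame for value in row]
    return sum(pixels) / len(pixels)


def identify_hand(frame, shape_grade, pattern_detected,
                  min_brightness=60, normal_threshold=0.7,
                  low_light_threshold=0.2):
    """Decide whether a moving object is a hand.

    shape_grade: hypothetical shape recognition result in [0, 1].
    pattern_detected: True if the object moved in a predefined pattern.
    """
    if mean_brightness(frame) >= min_brightness:
        # Good imaging conditions: rely on shape recognition.
        return shape_grade >= normal_threshold
    # Poor imaging conditions: lowered shape threshold, and the predefined
    # movement pattern becomes the primary evidence.
    return pattern_detected and shape_grade >= low_light_threshold
```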

Additional or other methods may be employed to determine hand identification in less than ideal imaging conditions. According to one embodiment image enhancement techniques are applied after detecting movement of the object. The image enhancement techniques may include modifying imager parameters and/or modifying environment lighting conditions. The imager parameters may include at least one of the group consisting of contrast, gain, brightness, frame rate and exposure time parameters.

According to another embodiment, schematically illustrated in FIG. 2B, the step of obtaining image data of the FOV (221) is followed by the step of motion detection (222). If motion is detected, a specific pre-defined movement (e.g., gesture) is sought (223). If a gesture is identified a step of applying image enhancement techniques (224) is performed. Following the image enhancement steps an optional step of applying shape recognition algorithms to the image data may be performed to identify a hand (225). A user command is generated based on the identified gestures or movements (226).

Image enhancement techniques include any suitable step to augment the image data of the object in relation to the background image data. These techniques may include modifications to image sensor parameters or modifications of the environment. Modifications to imager parameters may include increasing gain, contrast, exposure time, decreasing frame rate or zooming in on the identified hand to obtain a more detailed field of view, to enable better differentiation between the object and the background. These parameters or other suitable image enhancement parameters of the imager may be increased to a pre-determined level after which shape recognition algorithms are applied (step 224) or they may be gradually or dynamically increased, each increase depending on whether the pre-determined level was obtained or not.

Modifications to the environment may include changing lighting conditions, for example by activating a dedicated illumination source and/or activating additional imagers or filters on the existing imager to enhance imager sensitivity.

According to embodiments of the invention, environment lighting conditions may be altered to enhance differentiation between the object (preferably, the user's hand) and the background. For example, a user may be watching a movie on a PC or TV or DVD or any other suitable device, in a low illumination environment. If the user wishes to input a user command to the device, while he is watching the movie, he may move his hand in a specific gesture, in the camera FOV. The hand movement may be detected but the illumination conditions may not enable shape recognition of the hand, thus preventing the generation of a user command. To enhance recognition of the hand a light source, such as a white LED or IR LED, may be activated in response to the detection of movement. The added illumination may provide sufficient conditions for shape recognition to occur. Non visible illumination, such as IR illumination, may have the added value of providing illumination to the imager without inconveniencing the user.

According to one embodiment, gesture detection may result in a temporary change of the device screen parameters, such as increase of screen brightness, to provide the required illumination.

According to other embodiments a dedicated image sensor, in addition to the main image sensor (e.g., 103), may be activated in response to gesture detection. For example, an IR range visual sensor may be activated in low illumination conditions, and/or in response to activation of the IR illumination source, to recognize the user's hand. According to additional embodiments a single imager may be used (e.g., 103) and an IR filter can be manually or automatically applied to this imager to switch between visual light sensor and IR sensor.

According to some embodiments, a combination of image enhancement techniques may be applied. Some techniques may be applied initially and others added until a hand shape is recognized or until a predetermined level of enhancement is obtained.
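One way to picture the incremental application of enhancement techniques is the loop sketched below. The technique names, the callables and the stopping rule (stop when the supplied techniques are exhausted) are assumptions for illustration; real imager or illumination control would replace the callables.

```python
def enhance_until_recognized(techniques, hand_recognized):
    """Apply enhancement techniques one by one until a hand is recognized.

    techniques: ordered list of (name, apply) pairs, where apply is a
        callable that activates one hypothetical enhancement.
    hand_recognized: callable returning True once shape recognition succeeds.
    Returns the names applied and whether recognition succeeded.
    """
    applied = []
    for name, apply in techniques:
        if hand_recognized():
            break  # no further enhancement is needed
        apply()
        applied.append(name)
    return applied, hand_recognized()
```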

Once a hand is identified the object now identified as a hand may be tracked and hand gestures may be recognized to further generate user commands to control the device 101.

According to some embodiments of the invention identification of a user hand by the system 100 may be enhanced based on information gathered over time or over a number of frames. For example, if an object movement is detected in a specific recurring pattern over a large number of image frames, this may indicate that the object is not randomly moving but rather that the object is moving in a pre-defined pattern.

According to an embodiment schematically illustrated in FIG. 3 the threshold for hand shape identification is adjustable based on additional gathered information.

According to one embodiment image data, for example, data from two consecutive batches of images is received (310), for example, at processor 102, and movement is searched (312). If movement is detected it is compared to a movement in a pre-defined pattern (313) and if the movement detected is in a pre-defined pattern, shape recognition algorithms are applied (314). The results of the shape recognition algorithms are compared to a pre-determined threshold (317), above which hand shape detection is confirmed. If the comparison shows that the results of the shape recognition algorithms did not reach the pre-determined shape confirmation threshold (317) the confirmation threshold is lowered (316). Thus, the confirmation threshold may be lowered the more an object moves in a movement of a pre-determined pattern. Once the confirmation threshold is reached a user hand may be identified (318). Following user hand identification the object identified as a hand may be tracked and control gestures may be identified to control the device 101.
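The adjustable threshold of FIG. 3 may be sketched as follows. The starting threshold, the step by which it is lowered, the floor, and the choice to reset the threshold when the movement leaves the pattern are all assumptions of this sketch and are not specified by the embodiment.

```python
def confirm_with_adaptive_threshold(observations, start=0.8, step=0.05,
                                    floor=0.3):
    """Return the index of the batch at which a hand is confirmed, or None.

    observations: sequence of (in_pattern, shape_grade) pairs, one per batch.
    The threshold drops by `step` for every batch in which the object keeps
    moving in the predefined pattern without reaching the threshold.
    """
    threshold = start
    for index, (in_pattern, shape_grade) in enumerate(observations):
        if not in_pattern:
            # Assumption: leaving the pattern restores the initial threshold.
            threshold = start
            continue
        if shape_grade >= threshold:
            return index
        # Movement stays in the pattern: lower the confirmation threshold.
        threshold = max(floor, threshold - step)
    return None
```

With a constant grade of 0.62, for example, confirmation would occur at the fifth batch (index 4), once the threshold has been lowered from 0.8 to about 0.6.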

Hand shape confirmation may be achieved by grading detected shape parameters based on their proximity to a hand shape model. A threshold may be set and if the grade is, for example, above a certain percent similarity the threshold may be considered reached.

According to some embodiments of the invention a hand is identified by applying machine learning techniques.

According to one embodiment of the invention a continuous, rather than discrete, grading system has been developed in which features are assigned values which are compared to a threshold and a decision regarding the validity of the feature (in the context of identifying a hand shape) is made based on the proximity of a value to the threshold rather than determining if the value is above or below the threshold.

According to another embodiment deciding if an object is a hand is based on determining the number of frames in which a hand shape is identified. If a hand shape is identified in a certain number (or percentage) of the frames then it is confirmed that the object is a hand.
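A minimal sketch of the frame-count approach follows. The function name and the minimum fraction of 0.6 are hypothetical; the per-frame flags are assumed to come from a shape recognition step applied to each frame.

```python
def confirm_by_frame_count(hand_shape_flags, min_fraction=0.6):
    """Confirm a hand if a hand shape was identified in enough frames.

    hand_shape_flags: one boolean per frame.
    min_fraction: hypothetical minimum fraction of frames required.
    """
    if not hand_shape_flags:
        return False
    fraction = sum(hand_shape_flags) / len(hand_shape_flags)
    return fraction >= min_fraction
```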

According to some embodiments of the invention the system 100 may be adapted to easily identify a hand shape even within an environment of a plurality of moving objects. According to one embodiment, schematically illustrated in FIG. 4, image data is received (401) and movement is detected (402). In the case of several moving objects within the FOV all moving objects are tracked (404) and the movements are compared to a movement of a pre-determined pattern (e.g., a gesture) (406). If the movement of at least one object is the pre-determined movement, hand shape recognition algorithms are applied to the image data (405). According to one embodiment shape recognition algorithms are applied to the object identified as moving in the pre-determined movement. If none of the objects are moving in a pre-determined movement the tracking of step 404 may be continued until a pre-determined movement is found.
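The selection of candidate objects among several moving objects may be sketched as below, using a wave-like movement as the pre-determined pattern. Detecting a wave by counting reversals of horizontal direction is an assumption made for this sketch, as are the function names and parameter values; the embodiment does not prescribe a particular pattern test.

```python
def is_wave(xs, min_reversals=2, min_step=1):
    """Heuristic wave test on the horizontal positions of one tracked object."""
    reversals = 0
    previous_direction = 0
    for a, b in zip(xs, xs[1:]):
        delta = b - a
        if abs(delta) < min_step:
            continue  # ignore jitter smaller than min_step
        direction = 1 if delta > 0 else -1
        if previous_direction and direction != previous_direction:
            reversals += 1
        previous_direction = direction
    return reversals >= min_reversals


def objects_in_pattern(tracks):
    """Return the ids of tracked objects whose movement matches the pattern.

    tracks: dict mapping a hypothetical object id to its list of x positions.
    Shape recognition would then be applied only to the returned objects.
    """
    return [object_id for object_id, xs in tracks.items() if is_wave(xs)]
```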

Being able to identify a specific shape may be greatly aided bycombining information from a plurality of images. According to oneembodiment, the combined information can be a function of at least twoshape affinity grades from at least two images. For example, averagingshape affinity grades helps to obtain smoother, cleaner shape detection.For example, for a sequence of images an average may be calculated basedon grades given to objects identified with shape recognition algorithmsor based on their proximity to pre-assigned values or thresholds, asdescribed above. The threshold for shape recognition may be set based onthe calculated average, which is basically less sensitive to “noise”.Thus, an object whose calculated average shape recognition value(s) isabove the threshold will be considered a hand while objects that areunder the threshold are discarded. According to some embodiments thestep of tracking (404) includes tracking clusters of pixels havingsimilar movement and location characteristics in two, typicallyconsecutive images.

According to some embodiments of the invention methods of reverse optical flow may be used. Optical flow relates to the pattern of apparent motion of objects (or edges) in a visual scene, caused by the relative motion between the observer (e.g., camera) and the scene. Sequences of ordered images and tracking discrete image displacements allow the estimation of motion.

According to some embodiments a method for identifying a gesture includes tracking a point throughout a sequence of ordered images; calculating the reverse displacement of the point (the displacement of a point from frame 2 to frame 1, as opposed to from frame 1 to frame 2); comparing the reverse displacement to a desired displacement and if the comparison is below a pre-determined threshold, discarding the movement.

A method for computer vision based hand gesture identification using the reverse optical flow formulations, according to some embodiments, is schematically illustrated in FIG. 5. Image data is received (501). Typically, the image data includes image frames arranged in sequence. A hand shape is detected (502) (for example, by applying a continuous grading algorithm as described above) and points (pixels) of interest are selected (503) from within the detected hand shape area, the selection being based, among other parameters, on variance (points having high variance are usually preferred). Movement of points is determined by tracking the points from frame n to frame n+1 (504). The reverse optical flow of the points is calculated (the theoretical displacement of each point from frame n+1 to frame n) (505) and this calculation is used to filter out irrelevant points (506). A group of points having similar movement and location parameters is defined (507) and these points are tracked for further gesture recognition.
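The filtering of steps 505 and 506 can be illustrated with a forward-backward consistency check. This is only one possible reading: the description does not state the exact filtering criterion, so the rule used here (keep a point only if tracking it back from frame n+1 lands near its starting position in frame n) and the names `filter_by_reverse_flow` and `max_error` are assumptions. The tracker that produces the back-tracked positions is not shown.

```python
import math


def filter_by_reverse_flow(points_n, points_back, max_error=1.0):
    """Return indices of points whose reverse flow is consistent.

    points_n: (x, y) positions of the selected points in frame n.
    points_back: positions obtained by tracking each point forward to
        frame n+1 and then back to frame n (hypothetical tracker output).
    max_error: assumed tolerance, in pixels, on the round-trip error.
    """
    kept = []
    for i, (start, returned) in enumerate(zip(points_n, points_back)):
        if math.dist(start, returned) <= max_error:
            kept.append(i)
    return kept


points_n = [(10, 10), (20, 20), (30, 30)]
points_back = [(10.2, 9.9), (25, 20), (30, 30.5)]
# The second point drifts 5 pixels on the round trip and is dropped.
print(filter_by_reverse_flow(points_n, points_back))
```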

Using reverse optical flow methods according to embodiments of the invention may improve the accuracy of movement detection.

Algorithms as described above may also be used to confirm the sidedness of a hand or whether the palm or back of the hand is in view of the image sensor. For example, features of a suspected hand shape may be compared to left and right hand models and/or to palm and back of hand models and graded according to their proximity to the model. The shape getting the highest grade or having the most points with grades above a certain threshold is determined to be the imaged hand (left or right and/or palm or back of hand).

Identification of an object as a left or right hand and/or as palm or back of hand may be employed according to embodiments of the invention, for example in recognizing a gesture. According to some embodiments two hands of a user may be used in a specific gesture, as opposed to a single handed gesture. For example, a gesture of initialization (e.g., to start operation of a specific program) may require two hands of a single user. Another example of a two handed gesture is the use of two hands in gestures which manipulate a user interface such as in zooming a window in/out or in rotating an object on a screen. According to one embodiment, the system determines that a single user is using both hands only when a right hand is located to the right of a left hand (or a left hand located to the left of a right hand). According to some embodiments, additional parameters may be used by the system to determine that a single user is using both hands, for example, only a left and right hand of the same size may be determined to be two hands of the same user and/or only a left and right hand being distanced from each other no more than a pre-defined distance may be determined to be two hands of the same user. Two hands of the same sidedness are not identified as belonging to a single user. According to some embodiments the palm or back of a hand may be used to confirm a gesture. For example, a movement of a hand in a pre-determined pattern may be identified as a specific gesture only if the palm (or back of hand) is facing the image sensor.
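The same-user test described above can be sketched as a short predicate. The `Hand` record, its fields, the size tolerance and the maximum distance are all hypothetical; in particular, the description speaks of hands "of the same size", and the ratio tolerance used here is an assumed way of making that comparison robust to measurement noise.

```python
from dataclasses import dataclass


@dataclass
class Hand:
    side: str    # "left" or "right", as decided by sidedness detection
    x: float     # horizontal position of the hand in the image, in pixels
    size: float  # apparent hand size in the image, in pixels


def same_user(a, b, max_size_ratio=1.25, max_distance=300.0):
    """Decide whether two detected hands belong to a single user."""
    if a.side == b.side:
        return False  # two hands of the same sidedness
    left, right = (a, b) if a.side == "left" else (b, a)
    if right.x <= left.x:
        return False  # right hand must be to the right of the left hand
    big, small = max(a.size, b.size), min(a.size, b.size)
    if small <= 0 or big / small > max_size_ratio:
        return False  # sizes differ too much (assumed tolerance)
    return (right.x - left.x) <= max_distance


print(same_user(Hand("left", 100, 50), Hand("right", 300, 55)))
print(same_user(Hand("right", 100, 50), Hand("left", 300, 50)))
```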

According to other embodiments the left hand is used to generate a first user control command and the right hand is used to generate a second user control command. For example, the right hand may generate a media player mode command and the left hand may generate a presentation mode command (e.g., PowerPoint from Microsoft). According to some embodiments different finger movements can generate different user commands. For example, the thumb may be moved in a defined pattern to generate a user command. The user command may include switching a system to cursor operating mode. According to some embodiments the thumb is moved in a defined pattern to emulate a button or mouse click.

According to some embodiments the method includes returning the cursor to the position it was in prior to the start of the thumb movement.

According to one embodiment, schematically illustrated in FIG. 6, a method for computer vision based hand gesture identification includes receiving image data of a field of view (701) and identifying a right hand and a left hand (702). Right and left hands may be identified by identifying movement and then applying shape recognition algorithms, such as machine learning algorithms. Alternatively, shape recognition algorithms may be applied prior to identification of movement.

Once a left and a right hand are identified the movement of at least one of the hands is tracked (703) and its movement is compared to a pre-defined pattern (704). If the tracked hand movement is in a pre-defined pattern the movement is identified as a gesture (705).
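One simple way to compare a tracked movement to a pre-defined pattern (steps 704 and 705) is to reduce the trajectory to a string of coarse directions and match it against the pattern. The description does not specify the comparison method, so this encoding, the pattern "RLR" (a right-left-right wave) and the names `quantize` and `is_gesture` are illustrative assumptions.

```python
def quantize(points, min_step=5.0):
    """Reduce a trajectory to a string of directions (R, L, U, D).

    points: list of (x, y) hand positions, one per frame, in image
    coordinates (y grows downward). Steps shorter than min_step pixels
    are ignored and repeated directions are merged.
    """
    dirs = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if max(abs(dx), abs(dy)) < min_step:
            continue
        if abs(dx) >= abs(dy):
            d = "R" if dx > 0 else "L"
        else:
            d = "D" if dy > 0 else "U"
        if not dirs or dirs[-1] != d:
            dirs.append(d)
    return "".join(dirs)


def is_gesture(points, pattern="RLR"):
    """Return True if the tracked movement matches the assumed pattern."""
    return quantize(points) == pattern


wave = [(0, 0), (20, 1), (40, 2), (20, 3), (0, 2), (20, 2), (40, 1)]
print(quantize(wave), is_gesture(wave))
```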

According to some embodiments the size of an object may be the basis for detecting movement. For example, an object may be smaller the farther it is from the image sensor. Thus, relative position of points or group of points determined to be within a hand shape (for example as described with reference to FIG. 5) in consecutive frames can be used as an indication of movement of a user hand along the Z axis. For example, if points are getting closer to each other in consecutive frames, it may be determined that the object is getting smaller, namely, that the object is moving away from imager 103.
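A minimal sketch of inferring Z-axis movement from the spread of tracked points, assuming the same points are available in two consecutive frames. The spread measure (mean distance from the centroid), the 5% tolerance and the function names are assumptions chosen for the example.

```python
import math


def spread(points):
    """Mean distance of the points from their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum(math.dist((x, y), (cx, cy)) for x, y in points) / len(points)


def z_direction(prev_points, curr_points, tolerance=0.05):
    """Classify movement along the Z axis between two frames.

    Returns "away" if the points moved closer together (object appears
    smaller), "toward" if they moved apart, and "none" otherwise.
    """
    prev_spread = spread(prev_points)
    if prev_spread == 0:
        return "none"
    ratio = spread(curr_points) / prev_spread
    if ratio < 1 - tolerance:
        return "away"
    if ratio > 1 + tolerance:
        return "toward"
    return "none"


frame_1 = [(0, 0), (10, 0), (0, 10), (10, 10)]
frame_2 = [(2, 2), (8, 2), (2, 8), (8, 8)]
print(z_direction(frame_1, frame_2))
```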

Movement along the Z axis (the axis substantially along a line of sight of image sensor 103 or 805) may be used in gesture or movement pattern recognition. For example, as schematically illustrated in FIG. 7, a screen 801 (or other user interface devices) may display icons 802 or symbols, for example letter keys. A user's hand(s), an image of which is captured by an image sensor 805, may be identified and tracked, for example as described above, and the position of the user's hand may be correlated to a specific icon 802′. This correlation may result in some sort of change in the icon 802′ appearance. For example, the icon 802′ may become bold or may be illuminated to indicate selection of that icon. The user may choose/activate (e.g., simulate a key pressing action) the icon 802′ by moving his hand closer to screen 801 (essentially moving the hand along the Z axis in relation to the image sensor 805).

According to one embodiment a user interface may signal to a user when his hand movement has exceeded the limits of the FOV. For example, as schematically illustrated in FIG. 8 a user's hand 911 may be tracked and its position displayed on a screen 910. If the hand is approaching the limit of the image sensor's field of view (not shown) an icon 913 may appear and a picture of a hand may be shown to the right of the icon signaling to the user to move his hand back into the field of view (typically to the left).

Gestures recognized by the system and used to generate user commands to a device typically include hand movements in a specific pattern. According to some embodiments, a special gesture component should be used for indicating the end of a gesture so that the tracking of the hand does not continue after identifying the special gesture component. For example, closing the fingers of the hand into a fist may signal the end of a gesture. According to some embodiments if a hand motion is identified as a specific pre-defined gesture then a user aid tool is displayed, for example on display 106 (FIG. 1).

According to one embodiment, for example, as illustrated in FIG. 9 A, the user aid tool is a legend displayed on monitor 106. The legend may graphically or otherwise present the list of hand gestures and their result for user reference. Other legends and menus may be presented. According to one embodiment, the legend may be dynamic and application specific. For example, if a word processing application is being run on a PC, a specific hand gesture may call up a legend relevant to the word processing application.

Another aid tool is schematically illustrated in FIG. 9 B. The user aid tool may be a direction tool which may, for example, include a graphical display appearing on monitor 106, of possible directions in which the user may move his hand and the command activated by each direction.

According to yet another embodiment a user aid tool is provided to ensure continuous, fluent use. According to one embodiment, as schematically illustrated in FIG. 9 C, a video of the tracked hand is displayed on display 106 so that the user may have feedback regarding his hand gestures. The image of the user's hand or a graphic representation of the user's hand may be displayed within a frame 60 which depicts the boundaries of the FOV of the camera, such that the user may be warned if his hand is in the proximity of the FOV boundary. According to some embodiments, another indication to the user may be activated (for example, an audio signal) if the user's hand is approaching the image sensor field of view boundaries, so as to avoid losing track of the hand.
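The boundary warning can be sketched as a check of the tracked hand position against a margin inside the image frame. The margin of 10% of the frame size and the function name are assumptions; the description does not say how "proximity" to the FOV boundary is measured.

```python
def near_fov_boundary(x, y, width, height, margin=0.1):
    """Return True if a hand at (x, y) is close to the edge of the frame.

    width, height: image frame size in pixels.
    margin: assumed fraction of the frame treated as the warning zone.
    """
    mx, my = width * margin, height * margin
    return x < mx or x > width - mx or y < my or y > height - my


# In a 640x480 frame, a hand at x=620 is inside the right-hand warning zone.
print(near_fov_boundary(620, 240, 640, 480))
print(near_fov_boundary(320, 240, 640, 480))
```

A caller could use the result to show a warning icon or trigger an audio signal before tracking is lost.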

According to one embodiment a raw video or slightly processed video of the user's hand may be presented to the user. According to one embodiment the graphical aid tool is superimposed on the video of the user's hand. According to one embodiment the graphical aid tool shows buttons and the user may move his hand over the graphically displayed buttons to operate the buttons. According to some embodiments a specific area or point may be marked on the image of the hand to indicate the point which may activate the displayed buttons.

The user aid tools according to embodiments of the invention may be displayed to a user in a non-obstructive way, for example, in an unused corner of a device monitor or as a semi transparent display on the device screen.

According to some embodiments user hand gestures may control a displayed object movement, such as cursor movement and operation. According to one embodiment a system includes a forward facing camera and a processor to normalize cursor (or other displayed object) movement in accordance with the size of the user hand relative to the image frame which includes the hand (which is an indication of the distance of the hand from the camera). Thus, cursor (or object) movement may be linked to the actual hand displacement rather than to the hand angular displacement.
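A minimal sketch of the normalization, assuming the apparent hand width in pixels is used as the size measure. Dividing the pixel displacement by the hand width makes the cursor movement roughly proportional to the physical hand displacement; the `gain` parameter and the function name are hypothetical.

```python
def normalized_cursor_delta(dx_px, dy_px, hand_width_px, gain=100.0):
    """Scale a hand displacement in image pixels by the apparent hand size.

    dx_px, dy_px: hand displacement between frames, in image pixels.
    hand_width_px: apparent width of the hand in the same image.
    gain: assumed cursor pixels per hand-width of movement.
    """
    if hand_width_px <= 0:
        raise ValueError("hand_width_px must be positive")
    scale = gain / hand_width_px
    return dx_px * scale, dy_px * scale


# The same physical movement seen close to the camera (large hand, large
# pixel displacement) and far from it (small hand, small displacement)
# produces the same cursor movement.
print(normalized_cursor_delta(40, 0, hand_width_px=200))
print(normalized_cursor_delta(20, 0, hand_width_px=100))
```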

According to one embodiment a specific gesture may be used to move from a mode of operating gestures (where the system accepts user commands based on hand gestures) to a mode of cursor operation (where the system is controlled by a user interface device, such as a keyboard). For example, the specific gesture may be a movement of the thumb closer to or further away from the base of the hand.

According to some embodiments, while operating in “cursor operating mode”, a specific gesture may be used for releasing the cursor control. For example, while moving a cursor on a display using hand movement, the hand may reach the boundaries of the screen. A specific gesture (e.g., folding the fingers into a fist) may allow the user to temporarily disconnect from the cursor, move the hand back into the screen area and resume cursor control.

According to some embodiments, a hand gesture may be used to select and manipulate an object displayed on a screen (e.g., icon). For example, bringing a thumb close to a finger in a “tweezers movement” around the object on the screen may capture the object and movement of the hand may drag the object on the screen in accordance with the hand movement. Movement of the hand may also provide movement of an object on the z axis (e.g., shrinking and enlarging the object), and rotating the hand may rotate the object. Another such gesture may include bringing the tips of all fingers together such that they touch each other or bringing the tips of all fingers close to each other as if they were holding a valve, the object being the “valve”. Another gesture may include closing all the fingers of the hand to a fist shape. Opening the hand may be used to release the object so that hand movements no longer manipulate the object.

According to some embodiments a specific gesture (such as bringing the tips of all fingers close to each other such that they touch each other) may act like touching a touch screen at the location of the cursor. This gesture may be used to emulate a click if the hand returns quickly to its original shape, a double click if “clicking” twice, or dragging an object on the screen causing it to follow further movement of the hand, whereas movement of the hand without “grabbing” the object will not move it. Thus, gestures using two hands to rotate, zoom or otherwise manipulate a window or object on a screen may be active only once a “grab” gesture of at least one of the hands has been performed.

According to some embodiments a method for manipulating an object displayed on a device screen is provided. According to one embodiment the method includes identifying an initializing gesture; identifying an operating gesture, said operating gesture to select an object; tracking the operating gesture; and moving the object in accordance with the tracked gesture. Thus, according to some embodiments the operating gesture may include bringing a thumb close to a finger in a “tweezers movement”, bringing the tips of all fingers closer to each other or together such that they are touching or creating a fist shape.

According to further embodiments of the invention a method for user-device control includes identifying in an image a first part of the hand and a second part of the hand; tracking across a plurality of images the first part of the hand and generating cursor movement control based on the tracking of the first part of the hand; and tracking across the plurality of images the second part of the hand and generating user commands based on the tracking of the second part of the hand. The first part of the hand may include at least the hand base and the second part of the hand may include at least one finger.

According to one embodiment identifying the second part of the hand comprises differentiating between a thumb and other fingers. According to one embodiment the method includes differentiating between a left hand and a right hand based on identification of location of the thumb in relation to other fingers.

By separately tracking the base of the hand from one or more fingers and/or being able to differentiate the thumb from the other parts of the hand, cursor movement control and button click emulation can be provided concurrently with the same hand without interfering with each other. According to one embodiment a button or, for example, mouse left click command is generated by a movement of the thumb in relation to other parts of the hand (e.g., thumb moving towards or away from base of hand or from other fingers). Thus, finger movements and postures can be performed without affecting cursor position and movement. However, in some embodiments, in order to avoid unintentionally moving the cursor with a hand gesture, upon detection of a gesture, the system may automatically return the cursor to its position prior to the beginning of the gesture.
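The cursor-restoring behavior can be sketched with a small controller that remembers the cursor position when a gesture begins and restores it once the gesture is detected. The class and method names are hypothetical, and how the beginning of a gesture is recognized is outside the scope of this sketch.

```python
class CursorController:
    """Tracks a cursor position and undoes drift caused by a gesture."""

    def __init__(self, x=0, y=0):
        self.pos = (x, y)
        self._saved = None

    def move(self, dx, dy):
        """Move the cursor according to tracked hand-base movement."""
        self.pos = (self.pos[0] + dx, self.pos[1] + dy)

    def gesture_started(self):
        """Remember where the cursor was when the gesture began."""
        self._saved = self.pos

    def gesture_detected(self):
        """Return the cursor to its position prior to the gesture."""
        if self._saved is not None:
            self.pos = self._saved
            self._saved = None


cursor = CursorController(100, 100)
cursor.gesture_started()
cursor.move(3, -2)         # unintended drift while the thumb "clicks"
cursor.gesture_detected()
print(cursor.pos)
```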

According to some embodiments the method includes normalizing cursor sensitivity to the size of the hand in the image.

Embodiments of the invention further present a computer vision based remote control unit. According to embodiments of the invention a computer vision based remote control unit is provided which includes a processor and an interface to connect to a device with a graphical screen, said processor to: receive images of a field of view; detect a user hand from the images; track the user hand to identify a control gesture; generate a remote control command based on the identified control gesture; transmit the command to a remote device; and display on the device graphical information relating to remote device control options.

The remote control unit may include an IR LED wherein the transmission of the command to the remote device is by using the IR LED. Typically, the transmission uses a standard remote control transmission protocol.

According to some embodiments the remote device is a TV, STB or a DVD player.

According to some embodiments the processor is to choose a remote device based on the identified control gesture and the processor is to display graphical information specific to the chosen remote device. The graphical information may include application specific user interface information.

A schematic illustration of a remote control unit operable according to embodiments of the invention is presented in FIG. 10. A system operable according to embodiments of the invention may include a processor 1102 which is in communication with an image sensor such as camera 1103 and with a user interface device 1106 and a remotely controlled device 1101.

The camera 1103 obtains images of a field of view (FOV) 1104 which includes a user's hand 1105. The processor 1102 detects a user hand (for example, as described above) and generates a remote control command based on the hand gesture. The remote control command is transmitted to device 1101 (which may be for example a TV, STB, DVD player, etc.). The device 1101 may include a graphic interface 1106, such as a monitor or screen. The remote control unit may include an IR LED (not shown) wherein the transmission of the command to the device is by using the IR LED. The transmission typically uses a standard remote control transmission protocol.

According to some embodiments the remote control unit may include a connector 1122 to connect to electronic device 1101 and may be in wireless communication with another electronic device 1101′. Graphic information relating to the control of device 1101 as well as device 1101′ may be displayed on graphic interface 1106.

The connector 1122 may be any suitable wired or wireless connection, such as a USB connection.

According to some embodiments graphical information may be displayed on the graphic interface 1106, such that commands may be transmitted wirelessly to remote devices while graphical information is available to the user. The graphical information, which is typically displayed based on detected user hand gestures, may relate to remote device control options, e.g., a menu showing all the accessible remote devices (e.g., 1101 and 1101′) may be displayed so that a user may choose, typically with hand gestures, a remote device he wishes to operate. Other windows showing additional control options may be displayed to the user. For example, each device 1101 and 1101′ may have different control windows 1106 a and 1106 b, both displayed on a single graphic interface, such as 1106. Each window 1106 a and 1106 b may have a different control interface. For example, device 1101 may be a TV and device 1101′ may be a DVD player or a STB. According to one embodiment the TV may be connected by USB to a camera and processor unit and a STB or other device is wirelessly connected to the processor unit. According to other embodiments both connections may be wireless or the connection to the TV may be wireless while the connection to the other device may be through a USB or other electric connector.

According to one embodiment windows 1106 a and 1106 b are both displayed on the TV display but while window 1106 a shows TV control commands (such as volume, change channels, etc.), window 1106 b shows commands for playing a DVD (such as stop, rewind, play) or commands to operate a STB.

Once a specific device is chosen by the user the device or application specific window may be displayed on top of the other windows, which may be faded out while the chosen device window is bold.

Each device may have a different user aid tool displayed in different windows. A user aid tool may be, for example, a legend graphically or otherwise presenting the list of hand gestures and their result for user reference. The legend may be application and/or device specific.

This remote control unit according to embodiments of the invention enables control of multiple devices without the user having to move from device to device. A user standing in one place may activate and control several devices at once.

1. A computer vision based apparatus comprising: a processor, said processor to: detect a shape of a hand from a sequence of images; recognize a gesture of the hand; generate a remote control command based on the recognized gesture; and transmit the command to a device.

2. The apparatus according to claim 1 wherein the processor is to track the hand within the sequence of images to recognize the gesture.

3. The apparatus according to claim 1 comprising a camera to obtain the sequence of images, said camera in communication with the processor.

4. The apparatus according to claim 1 wherein the processor is to display on a graphical interface information relating to remote device control options.

5. The apparatus according to claim 1 comprising an IR LED wherein the transmission of the command to the device is by using the IR LED.

6. The apparatus according to claim 1 wherein the device is a set top box or streamer.

7. A method for computer vision based remote control of a device, the method comprising: using a processor to detect a shape of a hand from at least two images; recognize a gesture of the hand; based on the recognized gesture, transmit a control command to a remote device.

8. The method according to claim 7 wherein recognizing a gesture of the hand comprises detecting the shape of the hand.

9. The method according to claim 7 wherein recognizing a gesture of the hand comprises detecting movement in the at least two images.

10. The method according to claim 9 wherein the movement is in a pre-determined pattern.

11. The method according to claim 7 wherein transmitting a control command is performed using an IR LED.